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We discuss irreducible statistical limitations of future ton-scale dark matter direct detection ex- 
periments. We focus in particular on the coverage of confidence intervals, which quantifies the 
reliability of the statistical method used to reconstruct the dark matter parameters, and the bias 
of the reconstructed parameters. We study 36 benchmark dark matter models within the reach 
of upcoming ton-scale experiments. We find that approximate confidence intervals from a profile- 
likelihood analysis exactly cover or over-cover the true values of the WIMP parameters, and hence 
are conservative. We evaluate the probability that unavoidable statistical fluctuations in the data 
might lead to a biased reconstruction of the dark matter parameters, or large uncertainties on the 
reconstructed parameter values. We show that this probability can be surprisingly large, even for 
benchmark models leading to a large event rate of order a hundred counts. We find that combining 
data sets from two different targets leads to improved coverage properties, as well as a substantial 
reduction of statistical bias and uncertainty on the dark matter parameters. 



I. INTRODUCTION 

Among the large number of possible dark matter candi- 
dates [TH3], weakly- interacting massive particles (WIMP) 
[5] are by far the most widely studied. WIMPs naturally 
arise from popular extensions of the Standard Model of 
particle physics (e.g., the lightest neutralino in super- 
symmetry [6] [7] and the B 1 in theories with universal 
extra dimensions IHlfTO]). and they naturally achieve the 
appropriate cosmological relic density through thermal 
freeze-out in the early Universe. 

Several experiments are currently searching for these 
particles by looking for signals of WIMPs scattering 
on atomic nuclei in large underground detectors, and 
many others are planned for the next decade (see e.g. 
Ref. [1 and the discussion in Ref. pj]). Although the 
DAMA/LIBRA [12 and CoGeNT [13] collaborations 
have reported a modulation of the measured event rate 
that has been tentatively interpreted in terms of WIMPs 
(e.g. [14 J, and the CRESST-II collaboration has found 
a large excess of events in the acceptance region where 
a WIMP signal would be expected [15], these results 
can hardly be reconciled with null searches from exper- 
iments such as XENON100 QS1HZI, CDMS-II [HHH], 
EDELWEISS-II [20] and ZEPLIN-III [21]. The contro- 
versy will hopefully be resolved by next-generation di- 
rect detection experiments, where larger rates and better 
statistics could lead to an incontrovertible discovery of 
dark matter. 

If a WIMP-nucleon scattering signal is detected, the 
event rate and the shape of the measured spectrum of 
recoil energies can be used to determine the properties 
of the dark-matter particle, most importantly its mass 
and scattering cross-section. The constraining power of 



present and upcoming experiments has been thoroughly 
discussed in the literature [TT j [22H25] . Here, we present 
irreducible statistical limitations of future dark matter 
direct detection experiments. 

We focus on two different issues: first, we explore the 
concept of coverage of confidence intervals, which quan- 
tifies the reliability of the statistical method adopted to 
reconstruct the WIMP parameters. We investigate the 
coverage of one-dimensional confidence intervals, con- 
structed using an approximate method that relies on the 
assumption that profile likelihood ratios are chi-square 
distributed, based on Wilks' theorem [26] . This approxi- 
mate method of constructing confidence intervals is com- 
monly used for frequentist data analysis in the litera- 
ture in lieu of more complex methods (e.g. Feldman and 
Cousins [27] ) , which provide exact coverage by construc- 
tion. The coverage of parameter reconstructions has been 
previously discussed in the context of direct detection [28] 
and collider identification (29] of supersymmetric models. 

Second, we consider how well one can expect to recon- 
struct the WIMP properties from future direct-detection 
data, given the statistical fluctuations that will inevitably 
impact the observed energy spectrum. We perform pa- 
rameter reconstructions on thousands of simulated data 
sets to estimate the average uncertainty and bias in the 
reconstructions of several different WIMP benchmark 
models. Additionally, we provide an estimate of the num- 
ber of outliers in the parameter reconstructions. We 
show that for several different benchmark models that 
lead to small average uncertainties in the parameter re- 
construction, a non-negligible percentage of all recon- 
structions results in a much larger uncertainty, as a result 
of statistical fluctuations that impact on each individual 
data set. Considering the number of outliers for differ- 
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ent WIMP benchmark models is of crucial importance, 
since in practice there will be a unique realisation of each 
experiment, and the constraints derived from a particu- 
lar realisation can be very different from the outcome 
for the "average experiment", as illustrated below. Fi- 
nally, we investigate how the average uncertainty in the 
WIMP mass can be decreased by increasing the expo- 
sure of direct detection experiments, for several different 
benchmark points in WIMP parameter space. 

The complementarity between direct detection experi- 
ments using different target materials, and the possibility 
of obtaining tighter constraints on the WIMP parameters 
when combining data from more than one experiment, 
have recently been emphasized in Ref. [TTJ [2H [30j [31] . 
Here we compare the coverage, uncertainty and bias of 
reconstructed parameters for various benchmark points, 
based either on mock data sets from a single xenon ex- 
periment, or a combined analysis of mock data from a 
xenon experiment and a germanium experiment. 

Throughout our analysis we assume that the back- 
ground event rate is negligible, and ignore uncertainties 
in the nuclear physics of elastic scattering and the local 
WIMP distribution function. We expect that the cover- 
age, precision and bias of our reconstructions will degrade 
if the backgrounds are non-negligible and astrophysical 
uncertainties are fully taken into account. Given this op- 
timistic set-up, we present here a set of irreducible limi- 
tations on WIMP parameter reconstruction from future 
direct-detection experiments, arising from fundamental 
statistical fluctuations driven by the Poisson nature of 
the event rate. 

The paper is organized as follows: in Sec. [XT] we in- 
troduce the formalism of direct dark matter detection 
and discuss the expected performance of upcoming ex- 
periments. In Sec. |III| we present our parameter recon- 
struction method and introduce the statistical quantities 
we use to quantify the performance of our reconstruction 
procedure. We present our results in Sec. 
conclusions in Sec. [Vl 



II. DIRECT DARK MATTER DETECTION 

A. Theoretical formalism 

Dark matter direct detection experiments aim to de- 
tect signals of WIMPs scattering on target nuclei. The 
nuclear recoil spectrum for a WIMP of mass m x and a 
target nucleus of mass tun has the form 
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Here dR/dE R has units of events per unit energy per unit 
time per unit target material mass, po is the local dark 
matter density, a is the WIMP-nucleus scattering cross- 
section and Er is the WIMP-induced recoil energy of the 
nucleus. Neglecting gravitational focusing of WIMPs as 



they flow into the potential well of the Solar System, f(u) 
is the normalized local WIMP velocity distribution func- 
tion in the rest frame of the Galaxy, v~e is the Earth's 
velocity in this frame and v is the velocity of the WIMPs 
in the rest frame of the Earth (which is also the WIMP- 
nucleon relative velocity, as to a good approximation the 
nucleons are at rest in the Earth frame). In this paper 
we focus on elastic WIMP-nucleus interactions. For elas- 
tic scattering the minimum velocity v m i n required for a 
WIMP of mass m x to be able to induce a nuclear recoil 
of energy E R is 



\m N E R 



(2) 



where ji^ = m x m^ j (m x + tun) is the WIMP-nucleus 
reduced mass. 

The differential scattering cross-section da/dE R in- 
cludes different types of WIMP-nucleus interactions. We 
will assume that all events result from spin-independent 
WIMP-nucleus scattering and neglect all other types of 
interactions. In this case the differential scattering cross- 
section is given by 



da 
dE~R 



m N 



(3) 



where ^(Er) is the spin-independent nuclear form factor, 
which accounts for the finite extent and composite nature 
of the atomic nucleus, and af/ is the spin-independent 
(SI) zero-momentum WIMP-nucleus cross-section. This 
cross-section can be written in terms of the mass number 
of the nucleon A, its atomic number Z, the WIMP-proton 
coupling / p , and the WIMP-neutron coupling f n . 
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In the following we will assume that the WIMP-proton 
and WIMP-neutron couplings are very similar f p ~ f n 
(as appropriate in most supersymmetric setups [32 , but 
see also Refs. [33H36] for alternative scenarios), so that 
the WIMP-nucleus cross-section simplifies to af/ = 
^fijyA 2 f p /tt. In analogy to this expression we define the 
WIMP-proton cross-section a p T = 4/i^/^/7r, with n P = 
m x m p / (m x + m p ) the WIMP-proton reduced mass. The 
differential scattering cross-section can then be rewritten 
as 
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In this analysis we use the Helm form factor [37] 
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(5) 



(6) 



where q = ^/2rriNE R is the momentum transferred in the 
recoil, s = 0.9 fm, r = \/ c 2 + 77r 2 a 2 /3 — 5s 2 , a = 0.52 fm 
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and c = (1/23A 1 / 3 - 0.6) fm. Using Eq. (|5) the nuclear 
recoil spectrum can be rewritten as 



dR Po a^A 2 T 2 (E R ) 
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(7) 

The quantities of interest are the WIMP mass m x and the 
spin-independent WIMP-proton cross-section a^ 1 . The 
choice of target material enters the analysis via the mass 
number A and the form factor F(Er), and through v m i n . 
Note for m x ^> raw, v m in —> \/ER/2rriN, and hence the 
recoil spectrum depends on m x and a^ 1 only via the 
degenerate combination a^ 1 /(/i;?ra x ), which has a strong 
impact on the performance of the reconstruction of the 
WIMP properties, as we will see in the following sections. 

The third component that enters the recoil rate is the 
local astrophysical DM distribution, most importantly 
the local density po and the WIMP velocity distribution 
f(u). In this analysis we will model local astrophysics 
using the standard halo model. This model consists of 
an isothermal, spherically symmetric galactic WIMP dis- 
tribution. In this model, WIMP velocities follow a non- 
rotating isotropic Maxwellian distribution in a Galacto- 
centric frame with a one-dimensional velocity dispersion 
vo/V% where is the speed of the Local Standard of 
Rest. WIMPs traveling at very high velocities will es- 
cape the gravitational attraction of the galaxy and will 
therefore not be present in the halo. This is taken into 
account by truncating the velocity distribution at some 
escape velocity v esci leading to a WIMP velocity distri- 
bution function 



f{v + v~h) 







for \v + v~e\ < v esc 
otherwise , 

(8) 

with N = eTf(v esc /v )-27r- 1 ^(v esc /v )e-^^ v ^ 2 a nor- 
malization factor which ensures that f d 3 u f(u) = 1. The 
velocity of the Earth with respect to the rest frame of the 
galaxy is given by the sum of the local circular velocity 
vq, the Sun's peculiar velocity Vp ec and the Earth's ve- 
locity relative to the Sun 

V~E = V0 + Vpec + Vorb ■ (9) 

The contribution of both \vp ec \ ~ 10 km/s and ~ 
30 km/s to v~e is small compared to the contribution of 
vq ~ 200 — 300 km/s. As we consider neither directional 
signatures nor the annual modulation of the nuclear recoil 
spectrum in this study, the latter two terms in Eq. ([9| 
can be neglected and v~e — vb- 

It is well known that there is a sizeable uncertainty on 
the astrophysical parameters po,vo,v esc and f(u). Addi- 
tionally, the standard halo model can only be considered 
a first approximation to a much more complicated halo 
profile [38H4T] . In order to achieve a correct reconstruc- 
tion of the WIMP parameters from experiment, it is of 
vital importance to take into account these uncertain- 
ties [23H25] . The aim of this paper is to investigate the 



coverage properties and the quality of the reconstruc- 
tion for different WIMP benchmark models and iden- 
tify any unavoidable statistical effects. In order to do 
so we will assume an ideal case, fixing all of the astro- 
physical parameters to their fiducial values and neglect- 
ing their uncertainties. The fiducial values we use are 
po = 0.4 GeV/cm 3 , vo = 230 km/s and v esc = 544 km/s. 
We will investigate coverage properties of a more gen- 
eral framework that includes astrophysical uncertainties 
in the WIMP distribution function in a future work. 

The total number of recoil events Nr can be found by 
weighting the nuclear recoil rate in Eq. ([7| by the event 
acceptance c(Er), and integrating from some threshold 
energy E t hr to some maximum energy E max . Assuming 
that the acceptance is not energy- dependent, c(Er) sim- 
ply falls out of the integral, and becomes a mean effective 
exposure e e ff (which is the product of the detector mass 
and exposure time). Nr is then given by 
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(10) 



For our coverage study, we select a number of 
WIMP benchmark models, with benchmark mass and 
cross-section ranges m x = [25, 250] GeV and cr^ 1 = 
[10 -8 , 10 -10 ] pb. For each benchmark point the analysis 
is based on 10 3 mock data sets. 



B. Future direct detection experiments 

In order to assess the performance of the reconstruc- 
tion of WIMP properties from next-generation direct de- 
tection data, we will use ton-scale, low-background ver- 
sions of two current detectors. We will systematically 
investigate the constraints that data sets from these ex- 
periments can place on the WIMP properties for different 
benchmark models. 

The most stringent constraints on WIMP properties 
are currently provided by the XENON100 collaboration 
[T7] . The recently published 90% C.L. exclusion curve 
has a minimum cross-section of a^ 1 = 7.0 x 10 -9 pb at 
a WIMP mass m x = 50 GeV [17]. These constraints 
will be improved further once data from the proposed 
XENON1T experiment becomes available in 2015 [42] . 
Additionally, the DARWIN projectQis working towards a 
multi-ton scale noble liquid experiment which is expected 
to start running in 2017 and will probe spin-independent 
cross-sections down to 10 -12 pb. [43 A second promis- 
ing WIMP detection strategy is based on cryogenic de- 
tectors operating at very low temperatures, most notably 
the current CDMS-II germanium experiment [18 . The 
SuperCDMS and GEODM cryogenic germanium exper- 
iments aim to upgrade this experiment to the ton scale 
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within the next decade [44]. A second planned experi- 
ment using cryogenic detectors operating at mK temper- 
atures is EURECA^] This experiment is pushing for a 
target mass of 1 ton and will probe cross-sections down 
to 10- 10 pb. 

In this study we will use a ton-scale experiment with 
a liquid natural Xe target with average atomic mass 
131 g/mol, and a ton-scale Ge experiment with atomic 
mass 73 g/mol. The characteristics of these detectors are 
chosen to reflect projects that can realistically be built 
within the next 5-10 years; they are given in Table 
[I] Although large liquid argon experiments are also cur- 
rently under construction, we choose not to include sim- 
ulated argon data in this study, because previous studies 
have shown that germanium and xenon provide tighter 
constraints on the WIMP parameters and halo velocity 
distribution [TT] . 

For both the xenon and the germanium experiment we 
assume a threshold energy of E t hr = 10 keV and only 
consider recoil energies below 100 keV. This is a reason- 
able cut-off, given the exponential decay of the WIMP- 
nucleus recoil spectrum with energy. Studies have shown 
that resolving the exponential decay at high energies is 
important for improving parameter reconstruction [25] . 
For both experiments we assume a total cut efficiency of 
^cut = 80%. Following Ref. [TT], for the Xe experiment 
we take a fiducial detector mass of 5 tons and one year of 
operation. We assume that a percentage A^r = 50% of 
all nuclear recoils in the fiducial region are accepted, so 
that, after inclusion of the overall cut efficiency, the effec- 
tive exposure is e e ff = 2.00 ton x year. For the germanium 
experiment we adopt a fiducial detector mass of 1 ton and 
an exposure of three years. Taking into account the per- 
centage of events that survive the selection cuts 77 cut and 
the nuclear recoil acceptance for germanium Anr = 90% 
the effective exposure is e e ff = 2.16 ton x years. 

Several sources of background can induce additional 
recoil events in direct detection experiments, such as 
cosmic rays, or radioactive contaminations. Future de- 
tectors will apply a variety of advanced techniques in 
order to achieve extreme radio-purity and self-shielding 
of the detector, minimisation of cosmic ray events and 
precise determination of charge-to-light and charge-to- 
phonon ratios, in order to limit the background to < 1 
event per effective exposure. Given these prospects in 
the following we assume that backgrounds are negligible. 

We do not include the energy resolution of the detec- 
tors, as for both target materials including energy resolu- 
tion smearing has a negligible impact on the recoil rate, 
except possibly near threshold. The scenario considered 
here is therefore somewhat idealised, which means that 
the statistical uncertainties we identify are unavoidable, 
inherent to the WIMP benchmark point and target expo- 
sure, rather than a reflection of systematic uncertainties 



in detector response, backgrounds or modelling of the 
dark matter halo. 



III. STATISTICAL METHODOLOGY 

A. Mock data generation 

The data set for a direct dark matter experiment con- 
sists of the total number of observed events Nr and 
the spectrum of recoil energies {E R }, with i = 1, ..,Nr. 
The likelihood function C(0) for the WIMP parameters 
6 = {m x , dp 1 } is given by the Poisson probability of ob- 
serving Nr events, multiplied by the probabilities of each 
event of energy E l R having been drawn from the predicted 
probability distribution of event energies P(Er\0) 



C(0) 



N R (0) 



n e 



Nr 



N R \ 



(11) 



1 = 1 



Notice that in the above we have replaced the (latent, 
unobserved) true recoil energy E l R by the observed value 
E l R , thus assuming that energy resolution of the detectors 
is negligible, as outlined in the previous section. Nr(6) 
can be computed from Eq. (10), using the experimental 
[l| The distribution P(E R ,0) is 



characteristics in Table 

no more than the normalized recoil spectrum 



P(E R ,t 



dR/dE R (E R ,0) 



f£"dE' R dR/dE' R (E' R ,9) 



(12) 



where the rate dR/ (IEr(Er,0) is given in Eq. Note 
that the efficiency parameter e e ff drops out in the one- 
event likelihood because we assume that this function is 
independent of recoil energy. For both the Xe and the 
Ge target the integration limits are £? m in = 10 keV and 
^ max = 100 keV. As explained in the previous section 
no background events are included in Nr, as we assume 
the background to be negligible. The so-called unbinned 
likelihood function in Eq. ( 11 ) has been employed by both 
the XENON and the CDMS collaborations 03 [46]. The 
likelihood function for the combined data set of our two 
toy experiments is given by the product of the individual 
likelihood functions, each found from Eq. (11). 



The mock data sets for the experiments are generated 
as follows. First, the measured total number of counts 
Nr is drawn from a Poisson distribution with mean equal 
to the benchmark number of counts Nr. Then, val- 
ues for the measured recoil energies {E R }, i = 1,..,7V# 
are drawn from the differential event rate dR/dEji(ER), 
given in Eq. (|7|), for the benchmark value of the param- 
eters. 



B. Parameter reconstruction technique 
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We employ Bayesian methods to scan over the param- 
eter space and reconstruct the WIMP properties, see [47] 
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Target 


Ethr [keV] 


e [ton x year] 


Anr 


e e ff [tonxyr] 


# Background events 


Xe 


10.0 


5.00 


0.5 


2.00 


< 1 


Ge 


10.0 


3.00 


0.9 


2.16 


< 1 



TABLE I. Primary characteristics of future ton -scale dark matter direct detection experiments using xenon and germanium as 
target materials. For further details see section [II B| 



for further details. The cornerstone of Bayesian parame- 
ter inference is Bayes' theorem 



P(0\d) = 



P {d) 



(13) 



where p(6\d) is the posterior probability density func- 
tion (pdf), C(0) is the likelihood function and p{0) is 
the prior distribution on the parameters. The evidence 
is given by p(d), which in the context of parameter in- 
ference acts as a normalisation constant and will not 
be of interest in the following. There are two possible 
ways of looking at parameter inference: either in the 
Bayesian context (where the posterior pdf is the rele- 
vant quantity) or in the frequentist framework (where 
the likelihood function or a related test statistic is con- 
sidered). In this work, we will use Bayesian Markov 
Chain Monte Carlo (MCMC) techniques to obtain sam- 
ples from the posterior pdf of Eq. ( [13] ) , but we will also 
use these samples to map the likelihood function in the 
parameter space of interest, here the WIMP mass and the 
WIMP-proton spin-independent scattering cross-section, 
6 = {m x , dp 1 }. In order to sample from the posterior dis- 
tribution on these parameters, we have to specify their 
prior pdf p(0). Without assuming a specific underlying 
WIMP model there are no a priori constraints on m x and 
dp 1 . Therefore, we choose uniform priors on the log of 
both the WIMP mass and cross-section, reflecting igno- 
rance on their order of magnitude. The mass prior range 
is fixed to 1 < log 10 (ra x /GeV) < 3. The range of the 
cross-section prior is chosen to span two orders of mag- 
nitude around the benchmark cross-section. We extend 
this range where required, to avoid regions of high pos- 
terior probability density touching the prior boundary. 

Because the likelihood function is unimodal and well- 
behaved, and the parameter space is of low dimension- 
ality (D = 2), we can efficiently sample the posterior 
pdf using MCMC methods and use the ensuing samples 
to map out the likelihood function in a quasi-frequentist 
sense (see [48] for a detailed study of profile likelihood 
evaluation using Bayesian techniques in the context of su- 
persymmetric models). To this end, we use a Metropolis- 
Hastings algorithm [49 , 50 to generate a "chain" of sam- 
ples from the posterior pdf. As our proposal distribution 
we take a two-dimensional Gaussian centred on the pre- 
vious point in the chain; its covariance matrix is chosen 
according to earlier test runs. For some of the benchmark 
points we consider, the shape of the posterior distribution 
can vary strongly because of statistical fluctuations in the 
data realisation. In these cases, to achieve an efficient and 
complete sampling of the posterior we adopt a mixture 



strategy MCMC: our proposal distribution is a mixture 
of two different two-dimensional Gaussians, whose co- 
variance matrices are chosen (from earlier test runs) to 
match the two very different shapes of the posterior dis- 
tribution that can arise from the same benchmark model 
due to statistical fluctuations in the data ( "good" recon- 
structions and "bad" reconstructions, to be denned more 
precisely below). Every third proposal of the MCMC 
is not drawn from this Gaussian mixture, but instead 
is taken in a random direction, with a step size tuned 
to achieve an acceptable efficiency, in order to protect 
against under-exploration of the tails of the posterior. 

Each Markov chain contains a minimum number TV = 
3 x 10 5 samples; this ensures high enough statistics for a 
successful coverage investigation. Some benchmark mod- 
els lead to a very spread-out posterior distribution. In 
these cases we further increased the number of points in 
the chains, up to a maximum of TV = 5 x 10 5 points. 
We discarded the initial 10 4 samples of each chain (the 
so-called "burn-in"). We checked that this is sufficient 
to ensure that the resulting distribution is independent 
of the starting point of the MCMC and that the results 
of our analysis are stable when the length of the chains 
is doubled. Finally, we tested our MCMC method on 
toy models with known analytic posterior distributions, 
in order to verify its suitability and numerical stability. 



C. Coverage 

There are two ways of reporting inferences: x% cred- 
ible intervals (Bayesian) contain a fraction x of the pos- 
terior probability; they express the posterior degree of 
belief about the value of the parameter considered af- 
ter the data and any prior information have been taken 
into account. An x% confidence interval (Frequentist) is 
built from the likelihood function alone, and, ideally, it 
ought to contain ("cover") the true value of the param- 
eter x% of the time, when repeatedly applied to mock 
data generated from those true parameter values. This 
requirement leads to the concept of "coverage" . Coverage 
is an inherently frequentist concept, and it is not neces- 
sarily of concern to Bayesian statistics, although reliable 
behaviour of Bayesian credible intervals under repeated 
sampling is arguably also a desirable property. In the fol- 
lowing, we will mainly focus on evaluating the coverage 
and other statistical properties of (frequentist) confidence 
intervals, for the reasons outlined below. 

The profile likelihood test statistic for a point X 
in some TV-dimensional subspace Oat of the full M- 
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dimensional parameter space 9m (i.e. X G 6jv C 6m), 



is 



X(X) = -2 In 



c[x,e M - N (x)] 

^ma.x 



(14) 



Here £ max is the unconditional maximum likelihood i.e. 
the global maximum likelihood value across the entire 
M-dimensional parameter space. C[X, Qm-n(X)] is the 
conditional maximum likelihood for the given point X. 
The subspace Qm-n refers to the section of 9m that is 
not spanned by ©at. Qm-n(X) is the conditional max- 
imum likelihood estimate of the values of the parame- 
ters in Qm-n for X, i.e. the specific combination of the 
other M — N parameters that maximises the likelihood 
for the chosen X in Qjy. Confidence intervals with ex- 
act coverage can always be constructed by Monte Carlo 
evaluation of the distribution of A(X), as described in 
Ref. [27 , but in practice this may be a complicated and 
time-consuming procedure. 

Winks' theorem [26] shows that under certain regular- 



ity conditions, Eq. ( 14) converges asymptotically to a chi- 



square distribution with N degrees of freedom. Assuming 
Wilks' theorem holds, it is simple to define confidence in- 
tervals using the profile likelihood function and standard 
lookup tables for the chi-square distribution. However, in 
practice there is no guarantee that such confidence inter- 
vals will have the desired coverage properties, especially 
in cases where the likelihood function is strongly non- 
Gaussian, which leads to a lack of convergence of the test 
statistic to its asymptotic behaviour. Under- cover age 
(over-coverage) of a confidence interval means that the 
interval is too short (too large). While over-coverage is 
unnecessarily conservative, under- coverage can be a par- 
ticularly severe problem, as the true value of the param- 
eters will lie outside the stated interval a larger fraction 
of the time than its stated confidence level implies. 

In the following analysis we discuss the coverage of 
Wilks-based ID confidence intervals for the WIMP mass 
and spin-independent cross-section. The profile likeli- 
hood is constructed by binning the 2D parameter space 



({m x , <jp }), and determining the test statistics (14) in 



each bin. We then use Wilks' theorem to find the con- 
fidence level of interest. We used 750 bins in each di- 
rection of parameter space, choosing the bin size so that 
they covered the whole range spanned by the samples. 
We found that a significantly larger number of bins leads 
to large numerical noise, while a smaller number gives 
too coarse a likelihood mapping and hence artificial over- 
coverage (as tested on Gaussian toy models, for which the 
coverage is exact). 



D. Performance of parameter reconstruction 

In addition to determining how well the Wilks-based 
confidence levels cover the benchmark models, we are 
interested in estimating how well one may expect to con- 
strain WIMP properties from future direct detection data 



sets, including realisation noise. An important indicator 
is the uncertainty in the reconstructed parameters. In or- 
der to quantify this, we consider the expected fractional 
uncertainty (e.f.u.) along a direction in parameter space. 
The fractional uncertainty (f.u.) is defined as the frac- 
tional length of the 68% confidence interval relative to 
the benchmark parameter value #tme : 



f.u. = 



(15) 



The e.f.u. is the average of this quantity over 100 recon- 
structions. However, even a benchmark model with a 
small average f.u. may contain a sizeable number of re- 
constructions with a large parameter uncertainty. There- 
fore, in addition to the e.f.u. we also count the number 
of 'bad' reconstructions in 100 reconstructions. A bad 
case is defined as a reconstruction with an f.u.> 0.75, in 
which case only very limited constraints can be placed 
on the parameter in question (m x or crgi) from the data. 

The f.u. is somewhat similar to the statistical quantity 
known as effect size [5TJ [52] , which for the case of asi is 



d. 



SD 



(16) 



Here asi and SD are the mean and standard deviation, 
respectively, of a series of repeated measurements of osi- 
In our case, an equivalent role to &si an d SD are played 
by the best-fit reconstructed value of <tsi, and half the 
width of the corresponding 68% CI. This is because these 
quantities are good estimators for, respectively, the true 
value of <jsi and the standard deviation of <jsi, the ob- 
served best-fit value. The quantity asi,nuii refers to the 
value of (Jsi under the null hypothesis, i.e. the default 
situation against which the effect is being sought. In 
our case, the null hypothesis is simply that there is no 
WIMP signal, so <jsi = 0. Therefore, in the limit of zero 
bias, where the best-fit value of asi is exactly equal to 
the benchmark value, e.f.u. is approximately equivalent 
to 2d' 1 . The case of WIMP mass is less straightforward, 
as m x is undefined under the null hypothesis. 

One of the basic properties of statistical inference is 
that the power of a statistical test (its ability to avoid 
excluding a true hypothesis that differs from the null hy- 
pothesis) increases with d [52j [53J. This is simply the 
statement that larger effects can be detected more eas- 
ily. We can therefore see that the e.f.u. not only relates 
to the precision with which the WIMP mass can be re- 
constructed, but also gives some idea of the statistical 
power for detection of a WIMP with this mass. That is, 
a smaller e.f.u. indicates that a model can be detected 
more easily, so we expect the e.f.u. to roughly track the 
sensitivity of an experiment across the WIMP parameter 
space. 

We can further investigate the performance of the sta- 
tistical reconstruction by explicitly considering the bia^] 



Another useful quantity is the so-called "mean squared error" 
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for the parameters m x and a^ 1 . The statistical bias for 
a parameter is the expectation value of the difference 
between the best fit value #bf resulting from the recon- 
struction and the true value #true> i- e - 



bias = ( 6] 



7 bf ■ 



(17) 



As for the e.f.u., the expectation is taken by averaging 
the observed bias over 100 reconstructions. In the follow- 
ing we focus on the e.f.u. and bias of the reconstructed 
WIMP mass, as the performance of the reconstruction 
is expected to typically be poorer in the mass than the 
cross-section direction, due to the impact of statistical 
fluctuations on the observed recoil spectrum. 



IV. RESULTS 

A. The impact of statistical fluctuations on the 
reconstruction 

We investigate the performance of the reconstruction 
of WIMP properties for six benchmark masses m x = 
{25,35,50,70,100,250} GeV, and six spin-independent 
WIMP-proton cross-sections a^ 1 = {1.00 x 10" 8 ,3.98 x 
10~ 9 , 1.58 x 10~ 9 , 6.31 x 10- 10 , 2.51 x lO" 10 , 1.00 x lO" 10 } 
pb, thus 36 benchmark models in total. The number of 
dark matter recoil events above threshold for our Xe ex- 



periment (see section II B ) for these benchmark points is 



in the range 10 < Nr < 4000. As we focus on the case of 
a significant detection in a future experiment, we do not 
investigate the statistical properties of benchmark points 
in the very low counts regime, where Nr < 10, as it is 
hard to constrain much of anything with fewer than ~ 10 
events. 

Before we present results for our coverage study and 
the quantitative description of the performance of pa- 
rameter estimation, we show examples of good and poor 
reconstructions of WIMP parameters based on the mock 
data sets of a specific benchmark point. These examples 
illustrate points that will be important in our coverage 
and performance studies. 

Two examples of the reconstruction using Xe data are 
shown in Fig. [I] for a benchmark model with WIMP mass 
50 GeV and spin-independent WIMP-proton cross- 



section a^ 1 = 2.51 x 10~ 1U pb. This is an example of a 
benchmark point for which the performance of the re- 
construction can vary strongly with the mock data. We 
show on the left of Fig. [I] an example of a "good" recon- 
struction (i.e., well constrained likelihood in the m x — a^ 1 
plane), and on the right of Fig. [I] an example of a "bad" 
reconstruction (leading to an essentially unconstrained 



(MSE) for the parameters, given by the sum of the bias squared 
and the variance. We have found that the MSE behaves qualita- 
tively similarly to the e.f.u., so we do not discuss it separately. 



likelihood). For both cases we show the 68.3% and 95.4% 
likelihood contours (top) and the energy spectrum of the 
mock events (bottom), compared with the theoretical 
spectrum of the benchmark model (shown in black). 

For the first example (left) both the 68.3% and the 
95.4% confidence level spans a small range of masses 
and the benchmark point is well reconstructed. The dis- 
tribution of the observed energies agrees well with the 
true benchmark rate. In contrast, the second example 
(right) leads to confidence levels that spread over a large 
mass range; at 95.4% confidence only a lower limit on 
the WIMP mass can be inferred (note that the 95.4% 
contour does not close, but is cut off at the upper mass 
prior limit m x = 1000 GeV). The benchmark point is 
badly reconstructed mostly because of the presence of a 
relatively large number of high-energy counts at E > 40 
keV. Events with these energies are an unlikely realisa- 
tion of the benchmark WIMP spectrum, but can appear 
in the data due to statistical fluctuations. Poisson noise 
has flattened the observed energy spectrum relative to 
the predicted energy spectrum. The confidence intervals 
show "runaway" behaviour towards high mass because 
a flat energy spectrum is indicative of high masses, and 
the energy spectra for m x ^> tun are nearly identical. As 
an example, the theoretical spectrum for a WIMP model 
with m x = 250 GeV, cr^i = 6 31 x 10 -io is shown 
in red in the bottom right panel. Clearly this model is 
a better fit to the simulated events than the benchmark 
model. 

Note that this benchmark model leads to a large num- 
ber of events (Nr ~ 100), so that one would naively 
expect that statistical fluctuations in the realised spec- 
trum ought to have a minor impact. This is clearly not 
the case, as the bad reconstruction in the right panels of 
Fig. [I] shows that even with ~100 events, the parame- 
ter reconstruction can be poor. Even though we show in 
the rest of this section that this benchmark is relatively 
well-behaved — the coverage is exact for most intervals, 
the e.f.u. and bias are low, and the expected number of 
large-f.u. outliers is fairly small — there is a non-negligible 
probability that particular realisations of data sets for 
this benchmark lead to catastrophically poor WIMP pa- 
rameter reconstructions. 



B. Results from the coverage analysis 



In order to investigate the coverage results for the ID 
68.3% and 95.4% confidence intervals for m x and a^ 7 , for 
both Xe data and a combination of Xe+Ge date, we gen- 
erate 1000 mock data sets for each of the 36 benchmark 

The ID 68.3% (la) 



III 



models, as outlined in section 
and 95.4% (2a) confidence levels are constructed using 
Wilks' theorem and we count how often the true value of 
the WIMP mass and cross-section are found within the 
stated CL. We further subdivide the 1000 reconstructions 
into 10 subsets, of 100 reconstructions each, and we com- 
pute the coverage for each subset. We take the standard 
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FIG. 1. The left (right) panels show examples for a good (bad) reconstruction of the WIMP benchmark model with true values 
m x = 50 GeV, a^ 1 = 2.51 x 10 -10 . The difference is exclusively in statistical fluctuations in the simulated data. Top panels: 
68.3% and 95.4% confidence levels in the m x — a^ 1 plane; the red cross shows the true value. Bottom panel: energy spectrum 
of the mock data (yellow histogram - recall that we use an unbinned likelihood function, the counts are binned for a better 
visualization), true rate dR/dE(E) (black) and for the "bad" reconstruction an example of a rate (red) with a higher likelihood 
than the true rate. 



error of these ten values to estimate the statistical error 
of our coverage analysis, encompassing the uncertainty 
coming from finite numerical samples of the likelihood 
and the finite number of reconstructions. Although this 
statistical error on the coverage value varies mildly across 
benchmark points, it is sufficient for our purposes to use 
its average over all benchmark points. This leads to an 
estimated la error of 4.5% for the 68.3% intervals, and 
of 1.9% for the 95.4% intervals. 

We start by discussing the ID 68.3% and 95.4% con- 
fidence intervals for m x , shown in the top and bottom 
panels of Fig. |2j respectively. On the left-hand side we 



show the coverage results obtained for a Xe target, on 
the right-hand side we show results for the combined 
data set Xe+Ge. From the above estimate of the er- 
ror on the coverage, we define the coverage to be "exact" 
if it lies in the range (63.8,72.8)% and (93.5,97.3)% for 
the 68.3% and 95.4% contours, respectively. Benchmark 
points showing "exact" coverage within errors are dis- 
played in green. Coverage values > 72.8% (> 97.3%) 
correspond to over-coverage and are shown in red. Cov- 
erage values < 63.8% (< 93.5%) correspond to under- 
coverage. However, none of the benchmark points stud- 
ied here leads to under- coverage of any of the confidence 
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FIG. 2. Coverage results for the ID 68.3% (top) and 95.4% (bottom) confidence interval for the WIMP mass in the m x — a^ 1 
plane, for simulated Xe target (left) and for a combination of Xe+Ge (right). Green (red) regions show "exact" coverage 
(over-coverage), as defined in the text. Black regions correspond to a transition from exact coverage to over- cover age. No 
under-coverage is observed. Isocontours of the expected number of counts in the Xe experiment are given in black. In the 
upper left plot, the benchmark points studied are indicated by blue crosses. The 'flares' pattern seen in some points are an 
artefact of the interpolation scheme used to generate the plots. 



intervals. Benchmark points at the upper boundary of 
exact coverage or the lower boundary of over-coverage 
are displayed in black. For reference, isocontours of the 
expected number of counts Nr in a Xe experiment are 
also shown. 

For the Xe-only case, we find that most benchmark 
points lead to exact coverage of the ID 68.3% and 95.4% 
contours. For the 68.3% interval there is a region ob- 
served at high cross-sections and intermediate WIMP 
masses that borders on over- coverage; this is most likely 
the result of a statistical fluctuation. For both the 68.3% 
interval and the 95.4% interval, two regions leading to 
significant over-coverage can be identified, one at large 



7/1 v 



250 GeV, and another at small m x = 25,35 



GeV; both regions correspond to a small v^ 1 . The over- 
coverage observed in the first region is a result of the 
high-mass degeneracy (for m x ^> m^, cIR/cIEr depends 
only on a^ 7 /(/i^m x ); refer to Sec. II A). The importance 
of this effect decreases with increasing cross-section be- 
cause the slope of the energy spectrum is better resolved 
with more events, and hence is more sensitive to slight 
changes in v m i n . The high-mass degeneracy leads to a 
ID profile likelihood that can no longer be well approxi- 
mated by a Gaussian, such that the test statistic A(m x ) 
defined in Eq. ( [T4| ) starts to deviate from a chi-square 
distribution. The difference between the histogram of 
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FIG. 3. Difference between the histogram of the profile likelihood test statistic A(ra x ) from mock data sets and the value of the 
chi-square distribution with 1 degree of freedom (as predicted by Wilks' theorem) at the centre of each bin, as a function of 
A(ra x ), for two different WIMP benchmark points.. This difference quantifies the deviation from Wilks' theorem for these two 
benchmark points. For each benchmark point, 10 3 realisations of mock data sets have been used to construct this histogram. 
Errorbars assume Poisson count statistics. 



A(m x ) values from the mock data and the chi-square 
distribution with 1 degree of of freedom (as predicted 
by Wilks' theorem) is shown in Fig. [3] for a high-mass 
benchmark point suffering from over-coverage (m x — 250 
GeV, al 1 = 2.51 • 1(T 10 pb; see left-hand side of Fig. [2]). 
For comparison, we also show the same quantity for a 
benchmark point where the agreement with the predicted 
chi-square distribution is much better (m x = 50 GeV, 
dp 1 — 10 -8 pb), and whose coverage is exact to within 
errors. In contrast, for the high-mass point we observe 
significant discrepancies in the test statistics A(m x ) for 
values < 4, which explains why over-coverage is observed 
for this benchmark point. 

The over-coverage observed at small m x and 1 is a 
result of the low number of counts for this benchmark 
model. Due to the low statistics in the region of pa- 
rameter space the ID profile likelihood is no longer well 
approximated by a Gaussian, hence the asymptotic be- 
haviour of Wilks' theorem is less accurate. The deviation 
from Wilks' for these benchmark points is qualitatively 
similar to the red curve in Fig. [3J albeit less extreme. 

Coverage improves when the Ge data are added to the 
analysis, as can be seen in the right panels of Fig. [2] 
Exact coverage is obtained in most of the parameter 
space. An exception is observed at m x = 70 GeV, 



of 7 = 6.31 x 10~ 10 pb for the 95.4% plot, where slight 
over-coverage is found. Because neighbouring bench- 
mark points are exactly covered, we interpret this as 
a statistical fluctuation. Both regions of over-coverage 
identified in the Xe-only case shrink significantly when 
adding Ge data to the analysis. For both the 68.3% and 
the 95.4% interval the over-coverage at large m x is al- 
most completely eliminated, except at small a^ 1 (for the 
68.3% interval), for which the total number of expected 
events is O(10). For higher a^ 7 , over-coverage of high- 
mass benchmark models is reduced since the likelihood 
is tighter for a combined analysis of Xe+Ge. The re- 
maining over-coverage of the 95.4% interval at m x = 250 
GeV, (jp 1 = 1.58 x 10 -9 pb corresponds to a value of 
97.5%, which is just above the border of exact coverage 
at 97.3%. However, at lower masses, especially for the 
68.3% contour, over-coverage at very low cross-sections 
dp 1 « 10 -10 pb is not removed. In general, we find 
that the possibility of over-coverage remains as long as 
WIMP parameters are poorly constrained, which occurs 
most frequently for benchmark points which imply a low 
expected number of events. Both problems are resolved 
to some extent with the addition of data sets from a sec- 
ond experiment. 

We display the results of our coverage analysis for 
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FIG. 4. As in Fig. |2j but for the ID confidence intervals for <Jp . A significant improvement in the coverage when combining 
Xe+Ge is apparent. 



the ID 68.3% and 95.4% confidence intervals for cj^ 1 in 
Fig. |4j The left-hand plot shows the results for a Xe 
target, the right-hand plot shows the results for com- 
bined Xe+Ge data. In the case in which we consider the 
Xe data alone, most of the parameter space corresponds 
to exact coverage, but for both the la and the 2<j in- 
tervals a large region at high masses m x = 250 GeV is 
over-covered. For the 95.4% interval this region is spread 
over almost the entire cross-section range, and extends to 
m x = 100 GeV at low cross-sections. For the 68.3% in- 
terval a small region of over-coverage is found at interme- 
diate WIMP masses m x = 50, 70 GeV and low a^ 1 . For 
the 95.4% contour the corresponding benchmark points 
systematically show a coverage percentage at least 1% 
above the exact value of 95.4%. 

The over-coverage at large a^ 1 is a result of the high- 
mass degeneracy, analogously to what has been explained 



above for the mass. The over-coverage at intermediate 
WIMP masses can be explained using Fig. [T] Good 
reconstructions yield one dimensional profile likelihood 
functions that are approximately Gaussian, and thus lead 
to exact coverage. For bad reconstructions, the likelihood 
is spread over a larger range and thus the statement that 
dp 1 is over-covered for intermediate WIMP masses is a 
statement about the ratio of good to bad parameter fits. 
Due to low statistics resulting from the low number of 
counts the ID profile likelihood function can no longer 
be well approximated by a chi-square distribution, Wilks' 
theorem becomes less accurate and over-coverage is ob- 
served. On the other hand, the over-coverage around 50 
GeV WIMPs is not very significant, being close in magni- 
tude to the numerical uncertainty of our coverage values, 
and therefore could be interpreted as a statistical fluke. 
As with the WIMP mass, coverage improves with the 
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si r^Ki 




Coverage [%] 


ID 68.3% m x 


ID 95.4% m x 


ID 68.3% a^ 1 


ID 95.4% a^ 1 


35 


io- 10 


29 


73.3 (75.4) 


96.1 (96.3) 


69.2 (68.7) 


96.9 (95.5) 


50 


io- 10 


38 


68.3 (73.5) 


95.7 (96.3) 


73.3 (71.2) 


96.9 (96.8) 


100 


1.58 x IO" 9 


527 


70.3 (69.2) 


96.0 (95.3) 


68.9 (68.4) 


94.9 (95.6) 


250 


IO" 8 


1671 


68.0 (66.7) 


95.9 (94.9) 


69.2 (67.6) 


95.7 (95.2) 



TABLE II. Results of the coverage analysis of the ID confidence intervals for four selected benchmark points. Results for the 
Xe data alone are given, as well as for the combined analysis of Xe+Ge (in parenthesis). 



addition of data from a Ge target (right plots in Fig. [4| . 
For the 68.3% contour the over-covered region at in- 
termediate m x = 50, 70 GeV vanishes completely and 
is now exactly covered (apart from what can again be 
interpreted as a statistically non-significant fluctuation 
around 70 GeV, which appears as a 'flare' pattern in the 
figure). The over-covered region at high WIMP masses 
m x = 250 GeV shrinks significantly, but is difficult to 
eliminate at low cross-sections a^ 1 = 10 -10 pb, as dis- 
cussed above. The improvement in the coverage is even 
greater for the 2<j contour. For a combined analysis of 
data from Xe+Ge the over-coverage observed for the Xe 
target completely vanishes; the entire parameter space 
is exactly covered. The coverage results for a selected 
subset of benchmark points are shown in Table [TTJ 

Overall, our coverage analysis concludes that the ap- 
proximate confidence intervals for the studied benchmark 
points either cover exactly or over-cover the true values 
of the parameters - i.e., they are conservative. The two 
most important effects at play are the large mass de- 
generacy, and strong statistical fluctuations that are im- 
portant even for a relatively large numbers of expected 
counts (~ 100). We have shown that addition of data 
from a second target such as Ge leads to significant im- 
provement on both fronts. We point out that the ob- 
served over-coverage can in principle be remedied using 
methods such as Feldman- Cousins to build confidence in- 
tervals with guaranteed exact coverage. 

We have also investigated coverage properties of the 
credible intervals obtained from the Bayesian posterior. 
For well-reconstructed benchmark points, credible in- 
tervals are numerically identical to confidence intervals, 
since we have taken flat priors on our WIMP parame- 
ters of interest, so their coverage properties are the same. 
However, for badly reconstructed points (i.e., lying on the 
high-mass degeneracy line) the posterior is cut off at large 
masses and cross-sections by the prior range. This means 
that the ensuing ID marginal posterior and thus also the 
credible intervals become a function of the prior range 
adopted for the mass and cross section, which is clearly 
unsatisfactory (this effect has also been pointed out in 
another context by Ref. [54 ). As a consequence, the 
coverage of Bayesian credible intervals exhibits broadly 
the same trends as highlighted above for the frequen- 
tist intervals, but also shows a tendency towards under- 
coverage in some regions. As those results are however 
sensitive to the choice of prior range, we do not present 



coverage results for Bayesian credible intervals in this 
work - a thorough exploration of this issue would require 
a study of how such properties change as a function of 
the prior ranges chosen. We emphasize however that the 
prior ranges have no impact on our results for the fre- 
quent ist confidence intervals. 



Accuracy and precision of parameter 
reconstruction 



We now consider the question of the accuracy and 
precision of the parameter reconstruction. We start by 
investigating the expected fractional uncertainty (e.f.u.) 



for m x , introduced in section HID The e.f.u. quantifies 



the average fractional standard deviation of the recon- 
structed WIMP mass value and thus is a measure of the 
precision of the reconstruction. We show the e.f.u. in the 
m x — Op 1 plane in Fig. pi (notice that the upper limit 
of the colorbar is set to e~i.u.= 1.5 for display purposes, 
but this limit is surpassed in many cases) . Isocontours of 
the expected number of counts in a Xe target are shown 
in black. Isocontours of the number of "bad" cases (i.e., 
with an f.u. > 0.75) are shown in white. Considering the 
number of "bad" cases is very important, since this num- 
ber quantifies the probability that, for a given WIMP 
benchmark point (that may lead to a reasonably small 
average uncertainty on m x ), the experiment results in a 
data set that leaves the WIMP mass essentially uncon- 
strained. 

High-mass benchmark points lead to a likelihood func- 
tion with a long tail in the m x — a^ 1 plane, and thus 
are expected to have a very high e.f.u.. We are most in- 
terested in the region where the transition from good to 
poor performance takes place. 

We will first discuss the e.f.u. results from Xe data only. 
As a general pattern, the larger m x and the smaller <j^ j , 
the larger the e.f.u. value for the benchmark point. We 
will discuss the e.f.u. results at high (a^ 1 = 10 -8 pb), 
intermediate (a^ 1 = 10" 9 pb) and low (cr^ 1 = 10" 10 pb) 
cross-sections. 

At high (dp 1 = 10 -8 pb) cross-sections, most bench- 
mark masses lead to a small e.f.u., and thus a small un- 
certainty in the reconstructed WIMP mass. The e.f.u. 
does not exceed 0.15 for m x < 100 GeV and is signifi- 
cantly smaller for small m x = 25, 35 GeV (e.f.u. = 0.03). 
The fraction of bad reconstructions is < 1%. However, 
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FIG. 5. Expected fractional uncertainty (e.f.u.) for the WIMP mass in the m x — a^ 1 plane, for a Xe (Xe+Ge) target in the left 
(right) plot, quantifying the precision of the mass reconstruction (low e.f.u. corresponding to better precision). Isocontours of 
the expected number of counts in the Xe experiment are given in black; isocontours of the percentage of "bad" reconstruction 
(f.u. > 0.75) are shown in white. 



even for this large cross-section and the resulting large 
number of events, Nr = 1671, the high-mass benchmark 
point m x = 250 GeV leads to an e.f.u. > 1.00. Such 
a large e.f.u. means that the WIMP mass is left essen- 
tially unconstrained by the data, and the confidence lev- 
els inhabit the region of degeneracy at high masses and 
cross-sections. 

For intermediate benchmark cross-sections (a^ 1 = 
10 -9 pb), the overall precision is quite good. For bench- 
mark masses m x < 70 GeV the e.f.u. is < 0.30 and the 
WIMP mass is well constrained. This is also reflected in 
the number of bad reconstructions: for m x < 50 GeV this 
number is < 1%; for m x = 70 GeV only a couple of bad 
cases occur for 100 reconstructions. At higher m x the 
e.f.u. increases rapidly. For example, at m x = 100 GeV 
the e.f.u. increases from 0.41 to 1.21 when decreasing the 
cross-section from a^ 1 = 1.58 x 10 -9 (corresponding to 
N = 527 events) to a^ 1 = 6.31 x 10" 10 (correspond- 
ing to TV = 210 events). Therefore, at a^ 1 — 10 -9 this 
benchmark point lies on the borderline between good and 
bad performance of the reconstruction. At cross-sections 
°l l < 10~ 9 and high WIMP masses (m x > 100 GeV), 
the e.f.u. is systematically >0.75 (sometimes ^>0.75), 
meaning that the WIMP mass becomes essentially un- 
constrained in 20% or more of the reconstructions. This 
is to be expected, due to the m x — cj^ 1 degeneracy that 
occurs at high masses. However, it is interesting to see 
how pronounced this effect is even at a relatively small 
mass (m x « 100 GeV). 

The situation deteriorates significantly for a^ 1 = 
10 -10 pb, leading to a small number of counts [(9(10)] 
for all m x . This is reflected in the e.f.u., which is of or- 
der ~0.50 for small m x = 25, 35 GeV. This corresponds 



to weak constraints on the WIMP mass, and leads to 
an average uncertainty of more than 100% for m x > 50 
GeV. Similarly, while for small WIMP masses just above 
5% of all reconstructions are bad, this number is signifi- 
cantly higher for high-mass WIMP models. Even for an 
intermediate m x = 50 GeV, ~30% of reconstructions are 
bad. We emphasize once more that this is due to statisti- 
cal fluctuations in the realization of the energy spectrum, 
and therefore an unavoidable effect. 

As expected, the e.f.u. improves considerably with the 
addition of data from a Ge target. For fixed cross section, 
the 30% bad reconstruction isocontour shifts to higher 
mass values by ~ 50% with respect to the reconstruction 
with Xe data alone. Because the e.f.u. is correlated with 
the percentage of poor reconstructions, we also see that it 
decreases dramatically at fixed WIMP parameters (often 
by > 50%) with the inclusion of the Ge data. 

Fig. [6] shows the value of the e.f.u. as a function of the 
exposure e for a WIMP with cross-section a^ 1 = 10 -9 
pb and for three different benchmark masses. Solid lines 
correspond to the e.f.u. from a Xe target only, dashed 
lines show results for combining data from a Xe and a Ge 
experiment. For the Xe only case, for massive WIMPs 
(m x = 250 GeV), the expected fractional uncertainty is 
always greater than unity, as a consequence of the de- 
generacy. For intermediate (m x = 50 GeV) and small 
mass WIMPs (m x = 25 GeV), the e.f.u. drops sharply 
with increasing exposure. In particular, it is still of or- 
der ~ 30 — 40% for an exposure of 1 ton x year, and it 
is reduced to less than 10% for a Xe experiment with 
exposure 10 ton x year. When combining Xe + Ge data 
the situation improves for all benchmark masses. For 
massive WIMPs (m x = 250 GeV) an e.f.u. smaller than 
unity can be achieved for a Xe experiment with expo- 
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FIG. 6. Expected fractional uncertainty (e.f.u.) on the WIMP mass as a function of exposure for a Xenon experiment (bottom 
axis) and a Germanium experiment (top axis) required to achieve this e.f.u. for a WIMP with cross-section a^ 1 = 10 -9 , for 
three different benchmark masses m x — 25 GeV (red), m x — 50 GeV (black) and m x — 250 GeV (blue). Solid lines correspond 
to e.f.u. results for Xe only, dashed lines correspond to e.f.u. results for a Xe + Ge target. 



sure ~ 20 ton x year and a Ge experiment with exposure 
~ 10 ton x year. For larger exposures the e.f.u. further 
decreases. For both intermediate (m x = 50 GeV) and 
small (m x = 25 GeV) WIMP masses the e.f.u. for Xe 
+ Ge is significantly smaller than in the Xe only case. 
The e.f.u. strongly decreases as the exposures of the Xe 
and Ge targets are increased. In particular, for an in- 
termediate (low) mass WIMP an expected fractional un- 
certainty of less than 10% can be achieved for a 3 (1.5) 
ton x year exposure for Ge and a 5 (3) ton x year expo- 
sure for Xe. These trends are qualitatively consistent 
with those found by Refs. [55} [56]. 

However, we caution that the e.f.u. will be higher in 
reality for a fixed exposure and benchmark point, because 
of astrophysical and nuclear-physics uncertainties. 

The fractional mass bias in the m x — a^ 1 plane for a Xe 
target (Xe and Ge target) is displayed on the left (right) 
of Fig. [7] Almost no negative bias in the mass is ob- 
served. If a bias exists, it typically goes in the direction 
of a larger m x than the true value, as a consequence of the 
high mass-cross section degeneracy. In fact, the distribu- 
tion of reconstructions that reach up onto the degeneracy 
curve explains the features of Fig. [7J In comparing Figs. 
[5] and [7] we find that the curve for e.f.u. = 0.8 corre- 
sponds closely to the curve of bias = 0.2. When a large 
fraction of reconstructions are bad, both the e.f.u. and 
bias increase because the high mass-cross section curve 



becomes populated with high-likelihood fits. The exten- 
sion of the confidence levels to this region of the parame- 
ter space means that the best-fit mass is typically higher 
than the true mass, so that both the uncertainty in the 
mass and its bias become large. 

The performance of the statistical reconstruction (as 
quantified by the e.f.u., the number of bad cases and the 
fractional bias in the WIMP mass) is summarised for four 
benchmark points in Table |III| 



D. Comparison with other coverage studies 

We have focused on reconstructing phenomenological 
WIMP-related variables (mass, spin-independent cross 
section) rather than theoretical parameters in specific 
theories for WIMP physics. Perhaps not surprisingly, 
our results differ from recent studies of the coverage 
properties of parameters of specific supersymmetric mod- 
els from particle-physics experiments, including direct- 
detection data [28j [29] . Ref. [29] found that supersym- 
metric parameters were consistently over-covered when 
attempting to reconstruct the 'SU3' benchmark point 
with mock ATLAS data on sparticle masses and mass 
splittings. In contrast, consistent (and sometimes dras- 
tic) under- coverage was observed [28 for two different 
benchmark points reconstructed using mock ton-scale di- 
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.e. the bias of the WIMP mass relative to the benchmark 



m x [GeV] 




N R 


e.f.u. 


# bad cases 


fractional bias for m x 


35 


io- 10 


29 


0.51 (0.29) 


7(0) 


0.042 (0.023) 


50 


io- 10 


38 


1.24 (0.40) 


32 (4) 


0.272 (0.017) 


100 


1.58 x 10 -9 


527 


0.41 (0.22) 


9(0) 


0.014 (-0.020) 


250 


io- 8 


1671 


1.20 (0.48) 


51 (13) 


0.205 (0.052) 



TABLE III. Summary of the performance of the statistical reconstruction four selected WIMP benchmark models. The 
benchmark (true) mass and cross-section and the corresponding number of counts for the Xe experiment are shown. We give 



the expected fractional uncertainty, the number of "bad" (f.u. 
alone and for the combined analysis of Xe+Ge (in parenthesis). 



> 0.75) cases and the fractional bias in m x for the Xe data 



rect detection data. 

Here, we observed exact coverage in a large portion of 
the phenomenological parameter space we investigated. 
Unlike in supersymmetric analyses, the parameter space 
considered here does not include complicated theoreti- 
cal boundaries where the likelihood function is not de- 
fined. Substantial over-coverage is therefore not ex- 
pected in our results for cases with reasonable statistics 
(i.e. where Winks' Theorem does not break down sim- 
ply due to low- number statistics). Furthermore, the re- 
lationship between parameters of interest (here, WIMP 
mass and cross-section) and observables (i.e., counts) is 
far simpler here than when one works with fundamental 
supersymmetric parameters (which are connected to ob- 
servables via complex, non-linear Renormalization Group 
Equations that make the likelihood function highly non- 
Gaussian in the parameters). Therefore, sampling issues 
that might plague supersymmetric parameter spaces and 
lead to under- cover age are not observed in our setup. 

Taking the results of all three studies together, we con- 
clude that coverage properties are good when the scan- 
ning is done over a set of parameters that have a simple 
mapping to the observables (as was seen in [29]). As the 
observables on which a (typically approximately Gaus- 



sian) likelihood function is denned become a highly com- 
plicated function (i.e. via highly non-linear transforma- 
tions) of the parameters of interest, the coverage becomes 
less exact, and a detailed numerical investigation is re- 
quired to establish the coverage properties. The upshot 
of this for dark matter searches is that simple model- 
independent analyses using phenomenological particle- 
physics parameters for WIMPs can generally be expected 
to have good coverage, but the mapping onto specific 
model spaces will typically not retain this property. 



V. CONCLUSIONS 

We have studied the statistical properties of approx- 
imate confidence intervals on WIMP parameters, using 
mock data from future ton-scale direct detection exper- 
iments. We have focused in particular on the effect of 
unavoidable statistical fluctuations in the data. Con- 
trary to what has been observed in GUT-scale SUSY 
par ameterisat ions, we see that coverage for phenomeno- 
logical WIMP parameters (mass, cross-section) is gener- 
ally quite good. We have observed a small amount of 
over-coverage for certain benchmark points, i.e. the con- 



16 



structed confidence intervals are conservative. We have 
traced this over-coverage back to either statistical fluc- 
tuations, which become most important for benchmark 
points leading to a low expected number of counts, or 
to the degeneracy between the WIMP mass and cross- 
section, that occurs at large WIMP masses in the like- 
lihood function. In both cases the profile likelihood is 
not well approximated by a Gaussian, such that Wilks' 
theorem no longer accurately described the behaviour of 
the test statistics A(m x ) and A(asi). This problem is 
much less severe than in the SUSY case; in general, it ap- 
pears that the less complicated and nonlinear a function 
the likelihood is of the underlying parameter space, the 
better the coverage properties. Finally, we remind the 
reader that coverage issues can in principle be resolved 
altogether by constructing intervals that have exact cov- 
erage, e.g. by using the Feldman- Cousins method. 

We have found that the statistical bias and expected 
fractional uncertainty of the reconstructed WIMP mass 
and cross- section are more serious problems, which can- 
not be resolved by employing a different method of con- 
structing confidence intervals. The parameter recon- 
struction can be ruined by statistical fluctuations that 
flatten the observed energy recoil spectrum with respect 
to the true underlying model, leading to an essentially un- 
constrained likelihood function, so that only a lower limit 
can be placed on the WIMP mass and cross-section. This 
was found to be important even at intermediate WIMP 
masses and cross-sections. Therefore, even for bench- 
mark models leading to a relative large expected number 
of counts (> (9(100)), statistical fluctuations can result 
in a strong bias (i.e. low accuracy) and a low precision 
of the reconstruction of the WIMP parameters. 

We have shown that a combination of data sets from 
two independent experiments with different target ma- 
terials can significantly improve the coverage properties, 



reduce the bias and increase the accuracy and precision of 
the reconstruction. Furthermore, we have shown that the 
precision of the reconstruction can be improved consid- 
erably if the exposure of the experiment (s) is increased. 

Our investigation has assumed negligible backgrounds 
and fixed important sources of uncertainties, such as as- 
trophysical quantities describing the local dark matter 
distribution. Our modelling of the experimental like- 
lihood was correspondingly simplified. Therefore, the 
large bias and low precision of the reconstructed param- 
eters discovered for a number of benchmark models is a 
fundamental result of statistical fluctuations in the reali- 
sation of the energy spectrum. We expect that including 
the energy resolution, non-negligible backgrounds and as- 
trophysical uncertainties in the analysis would further 
degrade the performance of the reconstruction. 
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